This listing of claims will replace all prior versions, and listings, of claims in the 
application. 

1. (Previously Presented) A computer implemented method comprising: 

responsive to a first single instruction, identifying a first operand including four packed 
data elements, (r3, r2, r1 and r0), and identifying a second operand including four packed 
coefficients, (w3, w2, w1 and w0), generating four packed first products, (r3w3, r2w2, r1w1 and 
r0w0) and storing said four packed first products at a first destination identified by said first 
single instruction; 

responsive to a second single instruction, identifying a third operand including four 
packed data elements, (s3, s2, s1 and s0), and identifying a fourth operand including four packed 
coefficients, (w7, w6, w5 and w4), generating four packed second products, (s3w7, s2w6, s1w5 
and s0w4) and storing said four packed second products at a second destination identified by said 
second single instruction; and 

responsive to a third single instruction, identifying a fifth operand including the four 
packed first products and identifying a sixth operand including the four packed second products, 
generating four packed sums, (s2w6+s3w7, s0w4+s1w5, r2w2+r3w3, and r0w0+r1w1) and 
storing them at a third destination identified by said third single instruction. 
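
As an illustration only, the three instructions recited in claim 1 can be modeled in software. The sketch below is not the claimed hardware; the function names `packed_multiply` and `packed_horizontal_add`, the use of Python lists with the low element first, and the sample values are all assumptions made for this example.

```python
def packed_multiply(data, coeffs):
    """Model of the first and second instructions: four element-wise products."""
    return [d * w for d, w in zip(data, coeffs)]


def packed_horizontal_add(first, second):
    """Model of the third instruction: add adjacent pairs within each operand.

    Assumed layout: results from `first` occupy the two low positions and
    results from `second` occupy the two high positions.
    """
    return [
        first[0] + first[1],    # r0w0 + r1w1
        first[2] + first[3],    # r2w2 + r3w3
        second[0] + second[1],  # s0w4 + s1w5
        second[2] + second[3],  # s2w6 + s3w7
    ]


# Hypothetical sample operands, listed low element first.
r = [1, 2, 3, 4]          # r0, r1, r2, r3
w_low = [5, 6, 7, 8]      # w0, w1, w2, w3
s = [9, 10, 11, 12]       # s0, s1, s2, s3
w_high = [13, 14, 15, 16] # w4, w5, w6, w7

first_products = packed_multiply(r, w_low)    # [5, 12, 21, 32]
second_products = packed_multiply(s, w_high)  # [117, 140, 165, 192]
sums = packed_horizontal_add(first_products, second_products)
print(sums)  # [17, 53, 257, 357]
```

Read from the high element down, the printed list corresponds to the order in which the claim lists the sums.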

2. (Original) The method of claim 1 further comprising: 

responsive to the first single instruction, overwriting said four packed data elements in the 
first operand with said four packed first products; 

responsive to the second single instruction, overwriting said four packed data elements in 
the second operand with said four packed second products; and 

responsive to the third single instruction, overwriting said four packed first products in 
the fifth operand with said four packed sums. 
3. (Original) The method of claim 1 further comprising: 

responsive to a fourth single instruction identifying a seventh operand including the four 
packed first products and identifying an eighth operand including the four packed second 
products, generating four packed differences, (s2w6-s3w7, s0w4-s1w5, r2w2-r3w3, and r0w0-r1w1) 
and storing them at a fourth destination identified by said fourth single instruction. 
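
The fourth instruction of claim 3 can be sketched in the same software model. This is a hedged illustration, not the claimed circuit: the name `packed_horizontal_subtract`, the low-element-first list layout, and the sample product values are assumptions. Each difference is taken as the lower-indexed product minus the higher-indexed one, following the order of terms in the claim.

```python
def packed_horizontal_subtract(first, second):
    """Subtract adjacent pairs within each operand of four packed products.

    Assumed layout: results from `first` occupy the two low positions and
    results from `second` occupy the two high positions.
    """
    return [
        first[0] - first[1],    # r0w0 - r1w1
        first[2] - first[3],    # r2w2 - r3w3
        second[0] - second[1],  # s0w4 - s1w5
        second[2] - second[3],  # s2w6 - s3w7
    ]


# Hypothetical packed products, listed low element first.
first_products = [5, 12, 21, 32]       # r0w0, r1w1, r2w2, r3w3
second_products = [117, 140, 165, 192] # s0w4, s1w5, s2w6, s3w7

differences = packed_horizontal_subtract(first_products, second_products)
print(differences)  # [-7, -11, -23, -27]
```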

4. (Original) The method of claim 3, said fourth destination storing elements to represent 
saturated packed differences, (s2w6-s3w7, s0w4-s1w5, r2w2-r3w3, and r0w0-r1w1). 

5. (Original) The method of claim 1, said third destination storing elements to represent 
horizontal addition operations in a register specified by bits three through five of the third single 
instruction. 

6. (Original) The method of claim 5, said third destination storing elements to represent 
saturated arithmetic sums, (s2w6+s3w7, s0w4+s1w5, r2w2+r3w3, and r0w0+r1w1). 

7. (Original) The method of claim 5, said third destination comprising packed 16-bit elements 
to represent said four packed sums, (s2w6+s3w7, s0w4+s1w5, r2w2+r3w3, and r0w0+r1w1). 

8. (Original) The method of claim 5, said third destination comprising packed 32-bit elements 
to represent said four packed sums, (s2w6+s3w7, s0w4+s1w5, r2w2+r3w3, and r0w0+r1w1). 

9. (Original) The method of claim 8, said third destination storing elements to represent 
horizontal floating-point arithmetic operations. 

10. (Original) A processor comprising: 

a storage area to store a first packed data operand, a second packed data operand and a 
third packed data operand; and 

an execution unit coupled to said storage area, the execution unit to execute a first single 
instruction on data elements in said first packed data operand and said second packed data 
operand to generate a plurality of data elements in a first packed data result, at least one of said 
plurality of data elements in said first packed data result being the result of an intra-add operation 
performed on a first pair of data elements of said first packed data operand and at least one other 
of said plurality of data elements in said first packed data result being the result of an intra-add 
operation performed on a second pair of data elements of said second packed data operand, the 
execution unit to execute a second single instruction on data elements in said third packed data 
operand and said second packed data operand to generate a plurality of data elements in a second 
packed data result, at least one of said plurality of data elements in said second packed data result 
being the result of an intra-subtract operation performed on a third pair of data elements of said 
third packed data operand and at least one other of said plurality of data elements in said second 
packed data result being the result of an intra-subtract operation performed on the second pair of 
data elements of said second packed data operand. 

11. (Original) The processor of claim 10, wherein each of said plurality of data elements in said 
first packed data result being the result of an intra-add operation with signed saturation. 

12. (Original) The processor of claim 11, wherein each of said plurality of data elements in said 
second packed data result being the result of an intra-subtract operation with signed saturation. 
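
The intra-add and intra-subtract operations with signed saturation recited in claims 10 through 12 can be illustrated with a small software model. Everything specific here is an assumption for the sake of the example: the 16-bit element width, four elements per operand, the placement of results (pairs from the first argument in the low positions), and the helper names `saturate16`, `intra_add_sat` and `intra_sub_sat`.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # assumed signed 16-bit element range


def saturate16(value):
    """Clamp a result to the signed 16-bit range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, value))


def _adjacent_pairs(operand):
    """Group a packed operand into (even, odd) neighbouring element pairs."""
    return list(zip(operand[0::2], operand[1::2]))


def intra_add_sat(a, b):
    """Add each adjacent pair within `a`, then within `b`, with saturation."""
    return [saturate16(x + y) for x, y in _adjacent_pairs(a) + _adjacent_pairs(b)]


def intra_sub_sat(a, b):
    """Subtract each adjacent pair within `a`, then within `b`, with saturation."""
    return [saturate16(x - y) for x, y in _adjacent_pairs(a) + _adjacent_pairs(b)]


# Hypothetical operands, listed low element first.
first_operand = [30000, 10000, -5, 7]
second_operand = [-30000, -10000, 1, 2]
third_operand = [30000, -10000, 0, 1]

first_result = intra_add_sat(first_operand, second_operand)
second_result = intra_sub_sat(third_operand, second_operand)
print(first_result)   # [32767, 2, -32768, 3]
print(second_result)  # [32767, -1, -20000, -1]
```

In this model the sums 40000 and -40000 and the difference 40000 fall outside the 16-bit range, so they are clamped to 32767 or -32768 rather than wrapping around.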

13. (Original) The processor of claim 10, the execution unit, in response to said first single 
instruction, overwriting said first packed data operand with said first packed data result. 

14. (Original) An apparatus comprising: 

a first storage area for storing a first packed data operand, containing at least an A data 
element and a B data element packed together; 

a second storage area for storing a second packed data operand containing at least a C 
data element and a D data element packed together; and 

an arithmetic circuit responsive to execution of a first single instruction to add the A data 
element and the B data element to generate a first result element of a third packed data, and to 
add the C data element and the D data element to generate a second result element of the third 
packed data, the arithmetic circuit responsive to execution of a second single instruction to 
subtract the A data element and the B data element to generate a third result element of a fourth 
packed data, and to subtract the C data element and the D data element to generate a fourth result 
element of the fourth packed data. 
15. (Original) The apparatus of claim 14 wherein each of said plurality of data elements in said 
third packed data and said fourth packed data are the result of an intra-add operation or an intra- 
subtract operation with signed saturation. 

16. (Original) The apparatus of claim 14 further comprising: 

a decoder to decode said first single instruction and said second single instruction and to 
enable execution of said first single instruction and said second single instruction; and 

a register file comprising said first storage area and said second storage area, to provide 
the A data element, the B data element, the C data element and the D data element responsive to 
the execution of said first or second single instruction. 

17. (Original) A system comprising: 

a first storage area for storing a first packed data operand, containing at least an A data 
element and a B data element packed together; 

a second storage area for storing a second packed data operand containing at least a C 
data element and a D data element packed together; 

a third storage area for storing a third packed data operand containing at least an E data 
element and an F data element packed together; 

a decoder to decode a first single instruction and to enable execution of said first single 
instruction, the decoder to decode a second single instruction and to enable execution of said 
second single instruction; 

an arithmetic circuit responsive to enabling execution of said first single instruction to 
add the A data element and the B data element to generate a first result element of a fourth 
packed data, and to add the C data element and the D data element to generate a second result 
element of the fourth packed data, the arithmetic circuit responsive to enabling execution of said 
second single instruction to subtract the E data element and the F data element to generate a third 
result element of a fifth packed data, and to subtract the C data element and the D data element to 
generate a fourth result element of the fifth packed data; 
a wireless communication device to send and receive digital data over a wireless network; 

a memory to store digital data and software including the first and second single 
instructions and to supply the first and second single instructions to said decoder; and 

an input output system responsive to said software to interface with the wireless 
communication device receiving data to process or sending data processed at least in part by said 
first and second single instructions. 

18. (Original) The system of claim 17, wherein each of said first and second result elements of 
the fourth packed data are the result of an intra-add operation with signed saturation. 

19. (Original) The system of claim 18, wherein each of said third and fourth result elements of 
the fifth packed data are the result of an intra-subtract operation with signed saturation. 

20. (Original) The system of claim 17, wherein each of said first and second result elements of 
the fourth packed data and said third and fourth result elements of the fifth packed data are 16-bit 
results. 

21. (Original) The system of claim 17, wherein each of said first and second result elements of 
the fourth packed data and said third and fourth result elements of the fifth packed data are 32-bit 
results. 

22. (Original) The system of claim 21, wherein each of said first and second result elements of 
the fourth packed data and said third and fourth result elements of the fifth packed data are 
floating point results. 

23. (Original) The system of claim 21, wherein each of said first and second result elements of 
the fourth packed data and said third and fourth result elements of the fifth packed data are 
unsaturated signed integer results. 

24. (Previously Presented) An article of manufacture including one or more recordable media 
having executable instructions stored thereon including a first instruction and a second 
instruction which, when executed by a processing device, cause the processing device to: 

access a first packed data operand, containing at least an A data element and a B data 
element packed together at a first storage area; 
access a second packed data operand containing at least a C data element and a D data 
element packed together at a second storage area; 

add the A data element and the B data element to generate a first result element of a third 
packed data in response to said first instruction; 

add the C data element and the D data element to generate a second result element of the 
third packed data in response to said first instruction; 

access a fourth storage area for storing a fourth packed data operand containing at least an 
E data element and an F data element packed together; 

access the second packed data operand containing at least the C data element and the D 
data element packed together at the second storage area; 

subtract the E data element and the F data element to generate a third result element of a 
fifth packed data in response to said second instruction; and 

subtract the C data element and the D data element to generate a fourth result element of 
the fifth packed data in response to said first instruction. 

25. (Previously Presented) The article of manufacture of claim 24 which, when executed by a 
processing device, further cause the processing device to: 

overwrite said first packed data operand with said third packed data in response to said 
first instruction. 

26. (Previously Presented) The article of manufacture of claim 24 which, when executed by a 
processing device, further cause the processing device to: 

overwrite said fourth packed data operand with said fifth packed data in response to said 
second instruction. 

27. (Previously Presented) The article of manufacture of claim 24, wherein each of said first 
and second result elements of the third packed data and said third and fourth result elements of 
the fifth packed data are saturated signed values. 

28. (Previously Presented) The article of manufacture of claim 24, wherein each of said first 
and second result elements of the third packed data and said third and fourth result elements of 
the fifth packed data are 16-bit integer values. 

29. (Previously Presented) The article of manufacture of claim 24, wherein each of said first 
and second result elements of the third packed data and said third and fourth result elements of 
the fifth packed data are 32-bit integer values. 

30. (New) The method of claim 1, wherein the first operand comprises a unit of data that 
consists of the four packed data elements, (r3, r2, r1 and r0) which are all of the same size. 

31. (New) The method of claim 1, wherein the first operand comprises a unit of data that is 
fully populated with the four packed data elements, (r3, r2, r1 and r0) which are all of the same 
size. 
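
To make the notion of a unit of data fully populated with four same-size elements concrete, the following sketch packs four elements into a single integer with no unused bits. The 16-bit element size, the resulting 64-bit unit, the signed interpretation, and the names `pack4x16` and `unpack4x16` are assumptions chosen for illustration; the claims themselves do not fix these widths.

```python
ELEMENT_BITS = 16                      # assumed element size
ELEMENT_MASK = (1 << ELEMENT_BITS) - 1


def pack4x16(elements):
    """Pack four signed 16-bit values (low element first) into one 64-bit unit."""
    if len(elements) != 4:
        raise ValueError("expected exactly four elements")
    unit = 0
    for index, value in enumerate(elements):
        unit |= (value & ELEMENT_MASK) << (ELEMENT_BITS * index)
    return unit


def unpack4x16(unit):
    """Recover the four signed 16-bit values from a packed 64-bit unit."""
    elements = []
    for index in range(4):
        raw = (unit >> (ELEMENT_BITS * index)) & ELEMENT_MASK
        # Reinterpret the raw field as a two's-complement signed value.
        elements.append(raw - (1 << ELEMENT_BITS) if raw & 0x8000 else raw)
    return elements


packed = pack4x16([1, -2, 3, -4])  # r0, r1, r2, r3
print(hex(packed))                 # 0xfffc0003fffe0001
print(unpack4x16(packed))          # [1, -2, 3, -4]
```

Each element occupies exactly one quarter of the unit, so all 64 bits are accounted for by the four elements.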

32. (New) The apparatus of claim 14, wherein each of the first and second packed data 
operands comprises a unit of data that consists of a plurality of data elements of the same size. 

33. (New) The apparatus of claim 14, wherein each of the first and second packed data 
operands comprises a unit of data that is fully populated with a plurality of data elements of the 
same size. 



Dkt. No. 42P15770 

App. No. 10/611,326 



PAGE 12/13 * RCVD AT 12/20/2007 5:54:13 PM [Eastern Standard Time] * SVR:USPTO-EFXRF-5/41 * DNIS:2738300 * CSID:3037406962 * DURATION (mm-ss):04-06 



